Abstract — This paper considers the problem of compressive 
sensing over a finite alphabet, where the finite alphabet may be 
inherent to the nature of the data or a result of quantization. 
There are multiple examples of finite alphabet based static as 
well as time-series data with inherent sparse structure; and 
quantizing real values is an essential step while handling real 
data in practice. We show that there are significant benefits to 
analyzing the problem while incorporating its finite alphabet 
nature, versus ignoring it and employing a conventional real 
alphabet based toolbox. Specifically, when the alphabet is finite, 
our techniques (a) have a lower sample complexity compared 
to real-valued compressive sensing for sparsity levels below a 
threshold; (b) facilitate constructive designs of sensing matrices 
based on coding-theoretic techniques; (c) enable one to solve the 
exact $\ell_0$-minimization problem in polynomial time rather than an
approach of convex relaxation followed by sufficient conditions
for when the relaxation matches the original problem; and finally,
(d) allow for a smaller amount of data storage (in bits).

Index Terms — compressive sensing, finite alphabet. 

I. Introduction 

Compressive sensing has witnessed an explosion of research 
and literature in recent years. It has found useful applications 
in diverse fields, ranging from signal processing and microarrays
to imaging and computer vision [1], [2], [3], [4]. The
theory behind compressive sensing permits the sensing and 
recovery of signals, that are "sparse" in some domain, using a 
small number of linear measurements, roughly proportional to 
the number of non-zero values the signals take in the sparse 
domain [5], [6], [7]. To be precise, a real-valued $n$-dimensional
signal, with a $b$-sparse representation in some basis, can be
captured using $m = O(b \log(n/b))$ measurements based on
linear combinations of the signal values ($b, n, m \in \mathbb{N}$, $b < n$).
As such, compressive sensing finds its utility in setups where 
there is an inherent sparse structure in the nature of data, and 
storing or collecting measurements can be expensive. 

There are multiple practical algorithms for near-perfect 
recovery of real-valued sparse signals from their linear measurements,
in the presence or in the absence of noise [8],
[9], [10], [11], [12], [13]. These algorithms tend to either be
based on linear programming (like basis pursuit and Lasso) or 
low complexity iterative techniques (like orthogonal matching 
pursuit). Regardless of the algorithmic framework, a common 
underlying feature of real-valued compressive sensing is a 
property of incoherence in the sensing matrix corresponding 
to linear measurements (e.g., the restricted isometry property, RIP),
that serves as a sufficient
condition for accurate reconstruction of sparse signals [14],
[15]. From an analytical perspective, there is a large and
growing body of literature on the necessary and sufficient 
conditions for accurate recovery of sparse signals. 

In practice, signals are not always real-valued. For ex- 
ample, opinion polls, ranking information, commodity sales 
numbers, and other counting data sets including arrivals at a 
queue/server are inherently discrete-valued. Moreover, some of 
what might otherwise be regarded as continuous-valued data
sets are conventionally "binned" into finite alphabet sets. This 
includes rainfall data, power generation data and many other 
examples where quantized values are of interest as the output 
of the sensing process. In such cases, knowledge of the nature 
of the alphabet can prove to be useful, which together with the 
underlying sparsity property can lead to alternate and efficient 
algorithms for finite alphabet compressive sensing. 

In this paper, we consider a setup where the sensed informa- 
tion belongs to a known finite alphabet. We treat this alphabet 
as a subset of a suitable finite field, and make use of the 
field structural properties for compressing and reconstructing 
sparse signals. This approach enables us to use tools from 
algebraic coding theory to construct sensing matrices and 
design efficient algorithms for recovering sparse signals. In 
this process, we also build a deeper connection between the 
areas of algebraic coding theory and compressive sensing than 
what is currently understood in the literature [16], [17].

Our main application domain in this paper is in tracking 
discrete-valued time-series data. For example, consider the 
time-series data corresponding to the backlog in a set of queues 
in a discrete-time system. This backlog is typically discrete- 
valued, and we assume the change in backlog from one time 
instance to the next has a sparse nature. In many queuing 
applications, it is practically infeasible to exactly measure 
and store the backlog in each queue due to extremely short 
timescales; therefore, an efficient compression mechanism for 
sensing and storage is desirable. Our goal is to track this 
time-series accurately. Our benchmark for comparison is real- 
valued compressive sensing, i.e., storing real-valued linear 
combination based measurements and using a convex relax- 
ation approach for recovering the sparse differences between 
successive time instances. In Section V, we compare this
approach to the one presented in this paper, and find that for 
the same number of samples, the real-valued approach suffers 
from error accumulation as the time-series progresses, while 
the finite alphabet approach tracks the values exactly.

A. Motivation 

In case of real-valued compressive sensing, the recovery of 
sparse signals from linear measurements reduces to solving 
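an $\ell_0$-minimization problem. To be precise, given a $b$-sparse
vector $\mathbf{x} \in \mathbb{R}^n$, sensing matrix $\mathbf{A} \in \mathbb{R}^{m \times n}$ and vector of
measurements $\mathbf{y} = \mathbf{A}\mathbf{x}$, $\mathbf{x}$ can be recovered from $\mathbf{y}$ by solving

$$\min \|\mathbf{x}\|_0 \quad \text{s.t.} \quad \mathbf{y} = \mathbf{A}\mathbf{x}. \qquad (1)$$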

However, this optimization problem is non-convex and is known
to be NP-hard for the known meaningful constructions of sensing
matrices. Therefore, a standard relaxation approach used in
compressive sensing is to replace the $\ell_0$-norm by the $\ell_1$-norm, i.e.,
solve the convex optimization problem of $\ell_1$-minimization. A
sufficient condition for exact recovery via $\ell_1$-minimization is
that $\mathbf{A}$ satisfies incoherence properties such as RIP.

Typically, finite alphabet problems are more difficult or
complicated to solve than their real-valued counterparts. For
example, finding the logarithm is a simple inversion for real
values but is believed to be computationally hard over finite
fields. This difficulty forms the basis of the well-known
Diffie-Hellman key exchange algorithm, and other hardness
assumptions over finite fields play a similar role.
In our case, we have an interesting inversion of facts. Problem (1) is
NP-hard for most sensing matrices of interest in compressive
sensing over reals. However, using algebraic coding theory,
we can design sensing matrices $\mathbf{A}$ that are 'good' for sensing
as well as enable (1) to be solved in polynomial time.

Another important reason for finite alphabet analysis of 
compressive sensing is storage space. Real values are abstract
analytical artifacts, and are a blessing for mathematicians
and applied mathematicians alike, but in practice, values must
be stored and processed in the form of discrete alphabet sets. The
requirement for storage space (in bits or any other unit) is 
an important area of concern for any compression framework, 
and applications of compressive sensing are no exception. We 
show that our methodology not only affords a lower sample 
complexity (in symbols) under certain settings, but it also has 
a lower storage requirement in terms of bits of information to 
be stored for exact reconstruction of sparse signals. Finally, 
although relaxations of optimization problems are immensely 
useful and mathematically elegant, being able to solve the 
original problem in polynomial time is always desirable. This, 
for example, enables accurate tracking of sparse-structured 
discrete time-series data using finite alphabet compressive 
sensing, while avoiding error accumulation with time. 

B. Related Work 

The fact that real-valued compressive sensing allows for 
recovery of sparse signals based on linear measurements is 
reminiscent of error correction in linear channel codes and 
compression by lossless source codes over finite alphabets or
fields [18], [19]. Such similarities have been identified in
existing literature to serve varied goals. For example, the use of 
bipartite expander graphs to design real-valued sensing matrices
is investigated in [20]. The connection between real-valued
compressive sensing and linear channel codes is explored in 
pTl, by viewing sparse signal compression as syndrome-based 
source coding over real numbers and making use of linear 
codes over finite fields of large sizes. The design of real- 
valued sensing matrices based on LDPC codes is examined 
in [21] and [22]. The connection between sparse learning
problems and coding theory is studied in [23]. For real-valued
compressive sensing over finite alphabet, the sparse
signal recovery approaches that have been examined include
approximate message passing [24], sphere decoding and
semidefinite relaxation [25]. However, an algebraic understanding
of compressive sensing, particularly over finite fields, is still
limited; developing it is the main contribution of this paper.

C. Main Results 

We use an algebraic framework for analyzing the recovery 
of sparse signals/vectors based on finite alphabet, given the 
set of linear measurements. We show that $m = O(b \lceil \log_q n \rceil)$
measurements are sufficient for exact recovery of any $b$-sparse
$n$-dimensional signal based on a finite alphabet of size $q$ - this
is smaller than the number of measurements needed
for real-valued compressive sensing for non-trivial ranges of
$b, n, q$ (which ensures sparsity is below some threshold) [14],
[15]. We describe a coding-theoretic approach for constructing
the sensing matrices. Note that this is a straightforward
application of coding-theoretic principles, and thus the math- 
ematical concepts are by no means new - what makes it 
interesting is its connection and relevance to finite alphabet 
compressive sensing. We show the versatility and applicability 
of our approach to the case of noisy measurements, for 
both probabilistic and worst-case noise models. We apply our 
approach for tracking discrete-valued time-series with sparse 
changes, based on synthetic and real-world data; we find 
that, for the same number of samples, our approach performs 
accurate tracking while the real-valued approach accumulates 
error and drift in values as the time-series progresses. 

We wish to emphasize that, even if the mathematical tools 
we use are known, our results are not obvious - for example, 
there is no obvious reason why the bounds on sample complex- 
ity for the discrete case should be different (or better) than that 
of the real-valued case. Indeed, intuition suggests that discrete 
analysis should always be inferior in sample complexity, as 
discrete is a special case and more restrictive than its real- 
valued counterpart. Our contribution lies in the application of 
these known tools to finite alphabet data with sparse structure, 
and realizing that, for certain cases, lossless compression is 
possible with fewer samples and less storage space than its
real-valued counterpart, and that it finds natural application in
tracking discrete time-series data with sparse changes. 
Fig. 1: Block diagram for compressive sensing process.



The rest of the paper is organized as follows. We describe
the preliminaries in Section II, which introduces the problem
setup and provides background on the relevant algebraic
concepts used for analysis. We examine the problem of finite
alphabet compressive sensing for the cases of noiseless and
noisy measurements in Sections III and IV respectively. We
demonstrate the simulation results based on our approach in
Section V and conclude the paper with Section VI.



Given the vectors in $S_q$ are chosen according to a uniform
distribution, the application of the source coding theorem [26]
states that the number of measurements required to characterize
$\mathbf{x} \in S_q$ is at least
$\log_2 |S_q| = \log_2 \left[ \sum_{j=0}^{b} \binom{n}{j} (q-1)^j \right] = \Omega(b \log(n/b))$
for $b < n/2$. In this paper, we provide schemes
for designing $m = 2b \lceil \log_q n \rceil$ measurements that permit exact
and efficient recovery of $\mathbf{x}$ from $\mathbf{y}$. This matches the lower
bound on the number of measurements, stated above, in an order-wise
sense, for the scaling case $b = O(n^{\alpha})$, $\alpha \in [0, 1)$.

A critical algebraic tool that we utilize for designing the 
sensing matrices is field lifting, where we transform the prob- 
lem from the original finite field to a suitable field extension. 
We anticipate the audience for this paper may not necessarily 
be familiar with the algebraic concepts used in this paper. 
Therefore, to enhance its readability, we set the stage by 
introducing the concepts of extension fields and field lifting 
based on extension fields in the subsequent sections. 



II. Preliminaries 

Notation: We use $\mathbb{F}_q$ to represent the finite field with $q$
elements, where $q$ is a prime number or power of a prime.
For any field $\mathbb{F}$, we use $\mathbb{F}[x]$ to denote the polynomial ring
in variable $x$ with coefficients from $\mathbb{F}$. For $n \in \mathbb{N}$, $\mathbf{x} \in \mathbb{F}^n$,
we use $\mathrm{wt}(\mathbf{x})$ to denote the number of non-zero elements in
$\mathbf{x}$ (here zero refers to the zero element in $\mathbb{F}$). For $x \in \mathbb{R}$, we use
$\lfloor x \rfloor$ and $\lceil x \rceil$ to represent its floor and ceiling values.

Given $b, n, q \in \mathbb{N}$ with $b < n$, and a finite alphabet $\mathcal{A} \subset \mathbb{R}$
with $0 \in \mathcal{A}$ and $|\mathcal{A}| = q$, we consider the following ensemble:

$$S = \{\mathbf{x} = (x_1, x_2, \ldots, x_n) \in \mathcal{A}^n : \mathrm{wt}(\mathbf{x}) \le b\}.$$

This ensemble represents the space of $n$-dimensional signals
that are at most $b$-sparse with entries coming from $\mathcal{A}$. We
assume that $q$ is a prime number or its power, and consider a
bijective mapping $\phi : \mathcal{A} \to \mathbb{F}_q$ with the restriction $\phi(0) = 0$,
i.e., $0 \in \mathbb{R}$ gets mapped to the zero of $\mathbb{F}_q$. This allows us to
interpret $\mathcal{A}$ as $\mathbb{F}_q$ and we define the following set of vectors:

$$S_q = \{\mathbf{x} = (\phi(z_1), \ldots, \phi(z_n)) \in \mathbb{F}_q^n : (z_1, \ldots, z_n) \in S\}.$$

By construction, the vectors in $S_q$ are at most $b$-sparse.

We wish to develop a framework for efficient compression
of any $\mathbf{x} \in S_q$. In precise terms, we desire to reconstruct
$\mathbf{x} \in S_q$ using a minimal number of measurements that are linear
combinations of its elements, based on field operations and
coefficients from $\mathbb{F}_q$. This measurement process is given by

$$\mathbf{y} = \mathbf{A}\mathbf{x} + \mathbf{n}, \qquad (2)$$

where $\mathbf{A} \in \mathbb{F}_q^{m \times n}$ is the sensing matrix, $\mathbf{n} \in \mathbb{F}_q^m$ is the
measurement noise and $\mathbf{y} \in \mathbb{F}_q^m$ is the measurement vector.
The process is depicted as a block diagram in Figure 1. Note
that $\mathbf{y}$ can be interpreted as a noisy and compressed version
of $\mathbf{x}$. The overall goal of the problem setting is to design $\mathbf{A}$
such that $\mathbf{x}$ can be recovered accurately and efficiently.
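
As a rough illustration of the measurement model in (2), the following Python sketch draws a sparse vector over a prime field and computes its measurements. It assumes $q$ is prime, so that arithmetic in $\mathbb{F}_q$ reduces to integer arithmetic modulo $q$; the function names `sparse_signal` and `measure`, the random sensing matrix and the parameter values are illustrative assumptions, not constructions from this paper.

```python
import random

def sparse_signal(n, b, q, rng):
    """Draw x in F_q^n with exactly b nonzero entries (q assumed prime)."""
    x = [0] * n
    for idx in rng.sample(range(n), b):
        x[idx] = rng.randrange(1, q)
    return x

def measure(A, x, noise, q):
    """Return y = A x + n with all arithmetic taken modulo the prime q."""
    return [(sum(a * v for a, v in zip(row, x)) + e) % q
            for row, e in zip(A, noise)]

rng = random.Random(0)
q, n, m, b = 5, 8, 4, 2
# A random matrix is used only to exercise the model; it carries no recovery guarantee.
A = [[rng.randrange(q) for _ in range(n)] for _ in range(m)]
x = sparse_signal(n, b, q, rng)
y = measure(A, x, [0] * m, q)  # noiseless case: the noise vector is zero
```

For prime-power $q$ the modulo arithmetic above would have to be replaced by proper extension-field arithmetic.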



A. Background: Extension Field 

We start with the definitions of extension field and subfield: 

Definition II.1 (Extension Field). Let $\mathbb{F}$ and $\mathbb{K}$ be fields such
that $\mathbb{F} \subseteq \mathbb{K}$. Then $\mathbb{K}$ is called an extension field of $\mathbb{F}$ (also
denoted by $\mathbb{K}/\mathbb{F}$) and $\mathbb{F}$ is called a subfield of $\mathbb{K}$.

As an example, the field of complex numbers $\mathbb{C}$ is an
extension of the field of real numbers $\mathbb{R}$, constructed using
the root $i$ of the irreducible polynomial $x^2 + 1$ over $\mathbb{R}$, as
$\mathbb{C} = \{x + iy : x, y \in \mathbb{R}\}$. In general, irreducible polynomials
and their roots play a pivotal role in the generation of extension
fields from their base fields; therefore, we define them next:

Definition II.2 (Irreducible Polynomial). Given a field $\mathbb{F}$, a
polynomial $p(x) \in \mathbb{F}[x]$ that is divisible only by $c\,p(x)$ or $c$,
for nonzero $c \in \mathbb{F}$, is called an irreducible polynomial over $\mathbb{F}$. Also, if the
coefficient of the highest degree term is equal to $1 \in \mathbb{F}$, the
polynomial is called monic. A monic irreducible polynomial
with non-trivial degree is called a prime polynomial.

The extension field $\mathbb{F}_{q^s}$, $s \in \mathbb{N}$, can be constructed from $\mathbb{F}_q$
using the root of any irreducible polynomial of degree $s$ over
$\mathbb{F}_q$. It is known that the nonzero elements of $\mathbb{F}_{q^s}$ form a
cyclic group under multiplication, i.e., there exists at least one
$\alpha \in \mathbb{F}_{q^s}$ (called a primitive element of $\mathbb{F}_{q^s}$) such that
$\mathbb{F}_{q^s} = \{0, 1, \alpha, \alpha^2, \ldots, \alpha^{q^s-2}\}$
and $\alpha^{q^s-1} = 1$. The existence of primitive elements motivates
the concept of primitive polynomial that is defined as follows:

Definition II.3 (Primitive Polynomial). Given a field $\mathbb{F}$, a
primitive polynomial over $\mathbb{F}$ (and in $\mathbb{F}[x]$) is a prime polynomial
over $\mathbb{F}$ having a primitive element of an extension field,
say $\mathbb{K}$, as one of its roots, as a polynomial in $\mathbb{K}[x]$.

Using a primitive polynomial $p(x)$ of degree $s$ over $\mathbb{F}_q$
allows its root to become a generator for the extension field
$\mathbb{F}_{q^s}$, $s \in \mathbb{N}$. Therefore, $\mathbb{F}_{q^s}$ can be viewed as the set of
polynomials over $\mathbb{F}_q$ modulo $p(x)$. Then $\mathbb{F}_{q^s}$ can also be
viewed as a vector space of dimension $s$ over $\mathbb{F}_q$, with the basis
set $\{1, \alpha, \alpha^2, \ldots, \alpha^{s-1}\}$, where $\alpha$ is the primitive element of
$\mathbb{F}_{q^s}$ such that $p(\alpha) = 0$ in $\mathbb{F}_{q^s}$. As an example, the primitive
polynomial $p(x) = x^3 + x + 1$ over $\mathbb{F}_2$ and its primitive root
$\alpha$ in $\mathbb{F}_8$ can be used to generate the elements of $\mathbb{F}_8$. This
field extension process is depicted in Table I. The existence of
primitive polynomials is confirmed by the following lemma:

TABLE I: $\mathbb{F}_8$ as an extension field of $\mathbb{F}_2$
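
The generation of $\mathbb{F}_8$ from $\mathbb{F}_2$ can be sketched in a few lines of Python. This is an illustrative sketch, not code from the paper: each field element is stored as an integer whose bit $t$ is the coefficient of $\alpha^t$, and the helper name `gf2_powers` is an assumption made here. Multiplying by $\alpha$ is a left shift, followed by a reduction modulo $p(x) = x^3 + x + 1$ whenever the degree reaches 3.

```python
def gf2_powers(poly, s):
    """List alpha^0, ..., alpha^(2^s - 2) in F_{2^s}.

    Each element is an int whose bit t is the coefficient of alpha^t.
    poly is the primitive polynomial as a bitmask, e.g. 0b1011 for x^3 + x + 1.
    """
    elems, cur = [], 1
    for _ in range(2 ** s - 1):
        elems.append(cur)
        cur <<= 1          # multiply by alpha
        if cur >> s:       # degree reached s: reduce modulo p(x)
            cur ^= poly
    return elems

powers = gf2_powers(0b1011, 3)
for k, e in enumerate(powers):
    print(f"alpha^{k} -> coefficients (alpha^2, alpha, 1) = {e:03b}")
```

The seven powers are distinct and cover every nonzero element of $\mathbb{F}_8$, which is what makes $\alpha$ primitive.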

Lemma II.1. Given a finite field $\mathbb{F}_q$ and $s \in \mathbb{N}$, there are
$\varphi(q^s - 1)/s$ primitive polynomials over $\mathbb{F}_q$ that generate the
extension field $\mathbb{F}_{q^s}$. $\varphi$ is referred to as the Euler totient
function - for $n \in \mathbb{N}$ it is defined as $\varphi(n) = n \prod_{i=1}^{k} (1 - p_i^{-1})$,
where $p_1, p_2, \ldots, p_k$ are the distinct prime numbers that divide $n$.
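
The count in Lemma II.1 is easy to evaluate numerically. The sketch below is a minimal illustration; the function names `euler_phi` and `num_primitive_polys` are chosen here and do not come from the paper.

```python
def euler_phi(n):
    """Euler totient: n times the product of (1 - 1/p) over distinct primes p dividing n."""
    result, p = n, 2
    while p * p <= n:
        if n % p == 0:
            while n % p == 0:
                n //= p
            result -= result // p
        p += 1
    if n > 1:              # one prime factor larger than sqrt(n) remains
        result -= result // n
    return result

def num_primitive_polys(q, s):
    """Number of primitive polynomials of degree s over F_q, per Lemma II.1."""
    return euler_phi(q ** s - 1) // s

print(num_primitive_polys(2, 3))  # degree-3 primitive polynomials over F_2
```

For $q = 2$, $s = 3$ this gives $\varphi(7)/3 = 2$, matching the two primitive cubics $x^3 + x + 1$ and $x^3 + x^2 + 1$ over $\mathbb{F}_2$.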



B. Background: Field Lifting 

The idea of field lifting allows one to re-interpret a compressive
sensing problem over a particular finite field in one
of its extension fields. The motivation for doing so is that
extension fields have a larger number of dimensions and offer
more degrees of flexibility compared to their base fields.
Referring to the system model in (2), field lifting is performed
as follows. Given $s \in \mathbb{N}$, we consider a primitive polynomial
$p(x)$ of degree $s$ over $\mathbb{F}_q$ and its root $\alpha$, that is a primitive
element of $\mathbb{F}_{q^s}$. Note that $\mathbb{F}_{q^s}$ can be viewed as a vector space
of dimension $s$ over $\mathbb{F}_q$ on the basis set $\{1, \alpha, \alpha^2, \ldots, \alpha^{s-1}\}$:

$$\mathbb{F}_{q^s} = \left\{ \sum_{i=0}^{s-1} c_i \alpha^i : c_0, c_1, \ldots, c_{s-1} \in \mathbb{F}_q \right\}.$$

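We assume that the number of measurements $m$ satisfies $m = m's$,
$m' \in \mathbb{N}$. This allows us to define the following mappings:
• $\phi_s$: Given $\mathbf{C} = [c_{i,j}] \in \mathbb{F}_q^{m \times n}$, $\phi_s(\mathbf{C}) = [c'_{k,j}] \in \mathbb{F}_{q^s}^{m' \times n}$
is defined as $c'_{k,j} = \sum_{t=0}^{s-1} c_{(k-1)s+t+1,\,j}\, \alpha^t$. Thus, the $k$th
row of $\phi_s(\mathbf{C})$ is obtained by scaling the $((k-1)s+t+1)$th
row of $\mathbf{C}$ by $\alpha^t$, $0 \le t < s$, and summing them up.
• $\psi_s$: Given $\mathbf{c} = [c_i] \in \mathbb{F}_q^{m}$, $\psi_s(\mathbf{c}) = [c'_k] \in \mathbb{F}_{q^s}^{m'}$ is
defined as $c'_k = \sum_{t=0}^{s-1} c_{(k-1)s+t+1}\, \alpha^t$. Thus, the $k$th entry
of $\psi_s(\mathbf{c})$ is obtained by scaling the $((k-1)s+t+1)$th
entry of $\mathbf{c}$ by $\alpha^t$, $0 \le t < s$, and summing them up.
Note that fixing $p(x)$ and $\alpha$ imparts a unique algebraic structure
to $\mathbb{F}_{q^s}$ in terms of the elements of $\mathbb{F}_q$ and $\alpha$. As such, the
mappings $\phi_s$ and $\psi_s$ are bijective functions, i.e., their inverse
mappings $\phi_s^{-1} : \mathbb{F}_{q^s}^{m' \times n} \to \mathbb{F}_q^{m \times n}$ and $\psi_s^{-1} : \mathbb{F}_{q^s}^{m'} \to \mathbb{F}_q^{m}$
exist and are well-defined. Assuming measurement noise to
be absent in the system (i.e., $\mathbf{n} = \mathbf{0}$), the system model in (2)
can be restated in terms of $\phi_s$ and $\psi_s$, over $\mathbb{F}_{q^s}$, as follows:

$$\psi_s(\mathbf{y}) = \phi_s(\mathbf{A})\, \mathbf{x}. \qquad (3)$$

Note that $\mathbf{x} \in \mathbb{F}_q^n \subset \mathbb{F}_{q^s}^n$. This demonstrates field lifting of the
system model from $\mathbb{F}_q$ to $\mathbb{F}_{q^s}$. We make use of this concept
for designing sensing matrices for compressive sensing over
$\mathbb{F}_q$ in the subsequent sections, using mappings $\phi_s$ and $\psi_s$ for
some $s \in \mathbb{N}$, obtained by fixing a primitive polynomial of
degree $s$ over $\mathbb{F}_q$ and choosing a primitive root in $\mathbb{F}_{q^s}$.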
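
The lifted relation between $\psi_s(\mathbf{y})$ and $\phi_s(\mathbf{A})\mathbf{x}$ can be checked numerically in the simplest case $q = 2$. The sketch below is illustrative only and its helper names are assumptions: an element of $\mathbb{F}_{2^s}$ is stored as an integer whose bit $t$ is the coefficient of $\alpha^t$, so the lifting maps reduce to packing $s$ consecutive bits. Because $\mathbf{x}$ has entries in $\mathbb{F}_2$, the product needs only additions in $\mathbb{F}_{2^s}$ (bitwise XOR), and the primitive polynomial never enters the check.

```python
import random

def lift_vector(c, s):
    """psi_s for q = 2: pack each block of s bits into one element of F_{2^s}."""
    return [sum(c[k * s + t] << t for t in range(s))
            for k in range(len(c) // s)]

def lift_matrix(C, s):
    """phi_s for q = 2: apply the same packing to every column of C."""
    cols = [lift_vector([row[j] for row in C], s) for j in range(len(C[0]))]
    return [list(r) for r in zip(*cols)]

def lifted_product(H, x):
    """Compute H x over F_{2^s} for a binary vector x; field addition is XOR."""
    out = []
    for row in H:
        acc = 0
        for h, v in zip(row, x):
            if v:
                acc ^= h
        out.append(acc)
    return out

rng = random.Random(1)
s, m_prime, n = 3, 2, 7
m = m_prime * s
A = [[rng.randrange(2) for _ in range(n)] for _ in range(m)]
x = [0, 1, 0, 0, 1, 0, 0]
y = [sum(a * v for a, v in zip(row, x)) % 2 for row in A]  # y = A x over F_2

# The lifted measurement vector equals the lifted matrix applied to x.
lifted_ok = lift_vector(y, s) == lifted_product(lift_matrix(A, s), x)
print(lifted_ok)
```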

III. Noiseless Measurements 

In this section, we analyze the problem of recovering $\mathbf{x} \in S_q$
in the absence of noise, i.e., $\mathbf{n} = \mathbf{0}$. Then we have the relation

$$\mathbf{y} = \mathbf{A}\mathbf{x}. \qquad (4)$$

Note that this situation resembles the process of syndrome
decoding in linear codes from coding theory, where $\mathbf{x}$, $\mathbf{y}$ and
$\mathbf{A}$ play the roles of error vector, syndrome vector and parity
check matrix of the linear code respectively [17]. It is this
connection to linear codes that we exploit for designing $\mathbf{A}$ and
algorithms for recovering $\mathbf{x}$ from $\mathbf{y}$. We refer to a linear code $C$
as an $[N, K, D]_q$ code ($N, K, D \in \mathbb{N}$ and $q$ is a prime number
or its power) if the code alphabet is $\mathbb{F}_q$, codeword length is
$N$, number of codewords is $q^K$, and the minimum Hamming
distance between codewords is $D$ (i.e., at most $\lfloor (D-1)/2 \rfloor$
errors can be corrected). Then the following theorem holds:

Theorem III.1. Given $m = m's$, $m', s \in \mathbb{N}$, it is possible to
exactly recover $\mathbf{x} \in S_q$ from $\mathbf{y}$ if $\phi_s(\mathbf{A})$ is the parity check
matrix of a $[n, n - m', d]_{q^s}$ linear code with $d > 2b$.

Proof: We apply mapping $\psi_s$ to obtain $\psi_s(\mathbf{y})$. As described
in Section II-B, this performs indirect field lifting of
the setup from $\mathbb{F}_q$ to $\mathbb{F}_{q^s}$, and relation (3) holds. Since $\phi_s(\mathbf{A})$
is the parity check matrix of a $[n, n - m', d]_{q^s}$ linear code,
$\psi_s(\mathbf{y})$ acts as a syndrome vector and $\mathbf{x}$ acts as the error vector
generating it. Therefore, syndrome decoding can be used to
exactly recover $\mathbf{x}$ from $\mathbf{y}$, since the vectors in $S_q$ are equally
likely to occur (uniform distribution over $S_q$), $\mathrm{wt}(\mathbf{x}) \le b$ and
the code can correct up to $\lfloor (d-1)/2 \rfloor \ge b$ errors. ■

There exist structured linear codes with efficient algorithms
for syndrome decoding - one example is the family of
Reed-Solomon codes [27]. Designing $\phi_s(\mathbf{A})$ as the parity check
matrix of a Reed-Solomon code gives the following corollary:

Corollary III.2. Given $m = 2bs$, $s \in \mathbb{N}$, $s \ge \lceil \log_q n \rceil$, it
is possible to exactly recover $\mathbf{x} \in S_q$ from $\mathbf{y}$ using $O(nbs^2)$
operations in $\mathbb{F}_q$ if $n > 2b$ and $\phi_s(\mathbf{A})$ is the parity check
matrix of a $[n, n - 2b, 2b + 1]_{q^s}$ Reed-Solomon code.

Proof: The recovery of $\mathbf{x}$ from $\mathbf{y}$ follows from Theorem
III.1 with $m' = 2b$. Note that we require $s \ge \lceil \log_q n \rceil$,
since the alphabet size should not be less than the codeword
length for a Reed-Solomon code, i.e., $q^s \ge n$. Also, there
exist multiple algorithms for syndrome decoding in
Reed-Solomon codes, such as Euclid's algorithm based decoding
or the Berlekamp-Massey algorithm, that can reconstruct $\mathbf{x}$ from
$\mathbf{y}$ using $O(nb)$ operations in $\mathbb{F}_{q^s}$ [27]. Since $\mathbb{F}_{q^s}$ is a vector
space over $\mathbb{F}_q$, multiplication (and addition) of elements in
$\mathbb{F}_{q^s}$ means multiplication (and addition) of polynomials of
degree less than $s$ over $\mathbb{F}_q$ modulo a primitive polynomial. This
implies that a field operation in $\mathbb{F}_{q^s}$ is equivalent to $O(s^2)$ field
operations in $\mathbb{F}_q$. Overall, this amounts to $O(nbs^2)$ operations
in $\mathbb{F}_q$ for any of the syndrome decoding algorithms. ■
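
The exact recovery promised by Corollary III.2 can be illustrated on a toy instance. The sketch below makes several simplifying assumptions that are not part of the paper's construction: $q$ is prime and $s = 1$ (so no field lifting is needed), the parity check matrix is a $2b \times n$ Vandermonde matrix with evaluation points $1, \ldots, n$ (which requires $n \le q - 1$), and decoding is done by exhaustive search over all vectors of weight at most $b$ rather than by a polynomial-time decoder such as Berlekamp-Massey. It only demonstrates that the syndrome determines the sparse vector uniquely.

```python
from itertools import combinations, product

def rs_parity_check(n, b, q):
    """2b x n Vandermonde matrix over F_q (q prime, n <= q - 1).

    Any 2b columns are linearly independent, so the code has distance >= 2b + 1.
    """
    assert n <= q - 1
    return [[pow(j + 1, i, q) for j in range(n)] for i in range(2 * b)]

def syndrome(H, x, q):
    """Compute y = H x over F_q."""
    return [sum(h * v for h, v in zip(row, x)) % q for row in H]

def recover_sparse(H, y, b, q):
    """Brute-force search for the unique vector of weight <= b with syndrome y."""
    n = len(H[0])
    for w in range(b + 1):
        for support in combinations(range(n), w):
            for values in product(range(1, q), repeat=w):
                x = [0] * n
                for idx, val in zip(support, values):
                    x[idx] = val
                if syndrome(H, x, q) == y:
                    return x
    return None

q, n, b = 11, 10, 2
H = rs_parity_check(n, b, q)
x = [0, 3, 0, 0, 0, 0, 7, 0, 0, 0]   # a 2-sparse signal
y = syndrome(H, x, q)                 # m = 2b = 4 measurements
x_hat = recover_sparse(H, y, b, q)
print(x_hat == x)
```

The search cost grows combinatorially in $b$; the point of the algebraic decoders cited above is to replace this search with a polynomial-time procedure.
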

Note that Reed-Solomon codes are one family of codes that
can be used for designing sensing matrices. In general, any
family of linear codes that admits a polynomial time syndrome
decoding algorithm can be used for constructing sensing
matrices. Examples include BCH codes and special classes of
LDPC codes, like expander codes. Also, (4) can be thought of as a
source coding problem where a vector needs to be compressed.
LDGM codes are a family of good source codes that enable
efficient vector compression [28]. The generator matrices of
these codes have the interesting property that vectors close to
each other in Hamming distance get mapped to compressed
versions that are close in terms of Hamming distance as well.
This is, in some sense, analogous to the property of RIP, where
low-dimensional projections of vectors, close to each other in
the $\ell_2$-norm sense, are also mapped close to each other.



Number of measurements: Corollary III.2 suggests that
one can design $m = 2b \lceil \log_q n \rceil$ measurements (by choosing
$s = \lceil \log_q n \rceil$) for recovering $\mathbf{x} \in S_q$ in the noiseless setting,
using $O(nb(\log n)^2)$ field operations in $\mathbb{F}_q$. This scaling of $m$
is order-wise optimal for $b = O(n^{\alpha})$, $\alpha \in [0, 1)$, with respect
to the information-theoretic lower bound of $\Omega(b \log(n/b))$. A
sufficient condition that ensures $\ell_1$-minimization gives accurate
recovery in the real-valued framework is that the sensing
matrix satisfies RIP of order $2b$ with parameter $\delta_{2b} < (\sqrt{2} - 1)$
[10]. A convenient way of generating RIP matrices is by
choosing their entries from a sub-Gaussian distribution in an
i.i.d. fashion. Given $\delta \in (\sqrt{2} - 1, 1)$ and any $\kappa_1 > 0$, if
$m \ge 2\kappa_1 b \log_e(n/2b)$, then exact recovery is possible using
$\ell_1$-minimization with probability $\ge 1 - 2\exp(-\kappa_2 m)$, where
$\kappa_2 = \delta^2/(2\kappa^*) - \log_e(42e/\delta)/\kappa_1$ and $\kappa^* = 2/(1 - \log_e 2)$.
Therefore, for $b < 0.5\, q^{-(\kappa_1 \log_e q)^{-1}}\, n^{1 - (\kappa_1 \log_e q)^{-1}}$ (i.e., sparsity
below some threshold) the number of measurements required
for the finite field/alphabet framework is smaller.
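
To get a feel for the comparison above, the following sketch evaluates the finite-alphabet measurement count $2b\lceil \log_q n \rceil$, the real-valued sufficient count $2\kappa_1 b \log_e(n/2b)$, and the sparsity threshold $0.5\, q^{-(\kappa_1 \log_e q)^{-1}} n^{1-(\kappa_1 \log_e q)^{-1}}$. The function names and the value $\kappa_1 = 4$ are arbitrary illustrative assumptions; the paper leaves $\kappa_1$ as a free positive constant.

```python
import math

def ceil_log(q, n):
    """Smallest integer s with q**s >= n, computed without floating point."""
    s, power = 0, 1
    while power < n:
        power *= q
        s += 1
    return s

def finite_field_measurements(b, n, q):
    """m = 2 b ceil(log_q n), as in the noiseless finite-alphabet scheme."""
    return 2 * b * ceil_log(q, n)

def real_valued_measurements(b, n, kappa1):
    """Sufficient count 2 kappa1 b ln(n / 2b) for the real-valued framework."""
    return 2 * kappa1 * b * math.log(n / (2 * b))

def sparsity_threshold(n, q, kappa1):
    """Sparsity below which the finite-alphabet count is the smaller one."""
    expo = 1.0 / (kappa1 * math.log(q))
    return 0.5 * q ** (-expo) * n ** (1.0 - expo)

n, q, kappa1 = 1024, 2, 4.0   # kappa1 = 4 is an assumed value for illustration
for b in (5, 32):
    print(b, finite_field_measurements(b, n, q),
          round(real_valued_measurements(b, n, kappa1), 1))
print(round(sparsity_threshold(n, q, kappa1), 1))
```

Under these assumed values, both sparsity levels shown lie below the threshold, and the finite-alphabet count is the smaller of the two in each case.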

Storage space: The storage space needed for the measurement
vector is at most $2b \lceil \log_q n \rceil \log_2 q$ bits, that lies between
$2b \log_2 n$ and $2b \log_2 n + 2b \log_2 q$ bits. The storage space
taken by real values is, in theory, infinite, and in practice,
with $j$-bit quantization, is linear in $j$. For the case of real-valued
compressive sensing and the same number of measurements,
this amounts to $2jb \lceil \log_q n \rceil$ bits of storage space for the
measurement vector. Note that $j \ge \log_2 q$, since at least
$\log_2 q$ bits are needed to resolve among the elements of a
finite alphabet of size $q$ (even if they are treated as real
numbers). This gives storage space of at least $2b \lceil \log_q n \rceil \log_2 q$
bits. Therefore, the storage requirement for the finite field
framework is smaller compared to the real-valued framework
- also, the ratio of the number of bits needed for the algebraic
approach vs. the real-valued approach is $\approx \log_2 q / j$.
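
A small worked example makes the storage comparison concrete. The sketch below assumes the same number of measurements in both frameworks, as in the paragraph above; the function names and the parameter values ($q = 4$, $n = 256$, $b = 3$, $j = 32$) are illustrative assumptions.

```python
import math

def ceil_log(q, n):
    """Smallest integer s with q**s >= n."""
    s, power = 0, 1
    while power < n:
        power *= q
        s += 1
    return s

def storage_bits_finite(b, n, q):
    """Bits for 2b*ceil(log_q n) symbols of F_q, at log2(q) bits per symbol."""
    return 2 * b * ceil_log(q, n) * math.log2(q)

def storage_bits_real(b, n, q, j):
    """Bits for the same number of measurements stored with j-bit quantization."""
    return 2 * j * b * ceil_log(q, n)

b, n, q, j = 3, 256, 4, 32
finite_bits = storage_bits_finite(b, n, q)
real_bits = storage_bits_real(b, n, q, j)
print(finite_bits, real_bits, finite_bits / real_bits)
```

With these values the ratio equals $\log_2 q / j = 2/32$.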

Thus, the algebraic approach offers benefits in terms of
fewer measurements and less storage space (in bits),
provided the sparsity levels are below some threshold.

IV. Noisy Measurements 

In this section, we analyze the problem of recovering $\mathbf{x} \in S_q$
in the presence of noise; the purpose being to demonstrate
the versatility and utility of coding-theoretic tools for finite
alphabet sparse signal recovery. We consider two noise models
for analysis - probabilistic noise and worst-case noise.

A. Probabilistic Noise Model 

The probabilistic noise model is widely used for modeling
errors resulting from transmissions across communication
channels. For this model, we assume that $\mathbf{n}$ is generated
according to a probability distribution. For the sake of simplicity,
we consider $\mathbf{n}$ being generated by $m$ independent
uses of a $q$-ary symmetric channel with crossover probability
$\lambda \in (0, 1 - q^{-1})$ (similar analysis can be done for noise
with general probability distributions). In other words, if
$\mathbf{n} = [n_1\ n_2\ \cdots\ n_m]^T$, $n_i$ has the probability distribution
$P(n_i = a) = \lambda/(q-1)$ for $a \in \mathbb{F}_q \setminus \{0\}$ and $1 - \lambda$ for
$a = 0$. Here, we recover $\mathbf{x}$ from $\mathbf{y}$ in two steps. First, we
eliminate the effect of errors introduced by $\mathbf{n}$, using the error
correction capability of linear codes, and obtain a compressed
version of $\mathbf{x}$, with high probability. Next, we retrieve $\mathbf{x}$ from
this compressed version, as described in Section III.
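
The $q$-ary symmetric noise model can be simulated directly. The sketch below is illustrative: it represents the elements of $\mathbb{F}_q$ by the integers $0, \ldots, q-1$ (adequate for sampling, since only the zero/nonzero split and uniformity over nonzero symbols matter), and the function name `qsc_noise` is an assumption.

```python
import random

def qsc_noise(m, q, lam, rng):
    """Sample m i.i.d. noise symbols from a q-ary symmetric channel.

    Each symbol is 0 with probability 1 - lam, and otherwise uniform over
    the q - 1 nonzero symbols (probability lam / (q - 1) each).
    """
    noise = []
    for _ in range(m):
        if rng.random() < lam:
            noise.append(rng.randrange(1, q))
        else:
            noise.append(0)
    return noise

rng = random.Random(0)
noise = qsc_noise(20000, 5, 0.1, rng)
nonzero_fraction = sum(1 for e in noise if e != 0) / len(noise)
print(round(nonzero_fraction, 3))  # empirical crossover rate, close to lam
```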



We say that a linear code $C$ achieves probability of error
of at most $P_e$ over a channel if $\max_{\mathbf{c} \in C} P_e(\mathbf{c}) \le P_e$, where
$P_e(\mathbf{c})$ refers to the probability that codeword $\mathbf{c}$ is decoded
erroneously by a nearest-neighbor-codeword decoder, conditioned
on the fact that $\mathbf{c}$ was originally sent across the channel.
We also define $H_q(x) = -x \log_q x - (1 - x) \log_q(1 - x) +
x \log_q(q - 1)$, $x \in (0, 1)$. Then the following theorem holds:
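
The $q$-ary entropy function is straightforward to evaluate. The sketch below also computes $1/(1 - H_q(\lambda))$, the quantity that lower-bounds the expansion factor $c$ in the theorem that follows; the function names are illustrative assumptions.

```python
import math

def q_ary_entropy(x, q):
    """H_q(x) = -x log_q x - (1 - x) log_q(1 - x) + x log_q(q - 1), for 0 < x < 1."""
    return (-x * math.log(x, q) - (1 - x) * math.log(1 - x, q)
            + x * math.log(q - 1, q))

def min_expansion_factor(lam, q):
    """1 / (1 - H_q(lam)): the code rate 1/c cannot exceed the channel capacity."""
    return 1.0 / (1.0 - q_ary_entropy(lam, q))

print(round(q_ary_entropy(0.1, 2), 4))
print(round(min_expansion_factor(0.1, 2), 4))
```

$H_q$ reaches its maximum value of 1 at $x = 1 - q^{-1}$, which is why the crossover probability is restricted to $(0, 1 - q^{-1})$.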

Theorem IV.1. Given $\lambda \in (0, 1 - q^{-1})$, $m = cm' \ge cm''s$,
$c, m', m'', s \in \mathbb{N}$, $c > 1/(1 - H_q(\lambda))$, it is possible to exactly
recover $\mathbf{x} \in S_q$ from $\mathbf{y}$ with probability $\ge (1 - P_e)$ if $\mathbf{A} =
\mathbf{G}\mathbf{A}'$, where $\mathbf{G}$ is the generator matrix of a $[m, m', d]_q$ linear
code, achieving probability of error of at most $P_e$ over a $q$-ary
symmetric channel with crossover probability $\lambda$, and some set
of $m''s$ rows of $\mathbf{A}'$ forms $\mathbf{A}''$ such that $\phi_s(\mathbf{A}'')$ is the parity
check matrix of a $[n, n - m'', d']_{q^s}$ linear code, $d' > 2b$.

Proof: Using the given form of $\mathbf{A}$, we have $\mathbf{y} =
\mathbf{G}\mathbf{A}'\mathbf{x} + \mathbf{n}$. Since $\mathbf{G}$ is the generator matrix of a $[m, m', d]_q$
linear code, $\mathbf{G}\mathbf{A}'\mathbf{x}$ can be treated as a codeword, $\mathbf{y}$ as its
noisy version and $\mathbf{A}'\mathbf{x}$ as the message vector generating the
codeword. Therefore, it is possible to recover $\mathbf{A}'\mathbf{x}$ from $\mathbf{y}$ with
probability $\ge (1 - P_e)$. Note that the restriction on $c$ arises
from the fact that the rate of the linear code corresponding to
$\mathbf{G}$, i.e., $1/c$, cannot exceed $1 - H_q(\lambda)$, the capacity of the $q$-ary
symmetric channel. The knowledge of $\mathbf{A}'\mathbf{x}$ gives $\mathbf{A}''\mathbf{x}$ since
the rows of $\mathbf{A}''$ form a subset of the rows of $\mathbf{A}'$. Thereafter,
$\mathbf{x}$ can be recovered from $\mathbf{A}''\mathbf{x}$ as described in the proof of
Theorem III.1 (via field lifting and syndrome decoding). ■

Note that $\phi_s(\mathbf{A}'')$ can be chosen as the parity check matrix
of a Reed-Solomon code and $\mathbf{A}'$ can be designed to have $\mathbf{A}''$
as its sub-matrix. One family of linear codes that achieves
small error probabilities over the $q$-ary symmetric channel is
concatenated codes [27], [29]. For example, given $\lambda \in (0, 1 -
q^{-1})$, $\epsilon \in (0, 1 - H_q(\lambda))$, $\rho \in (0, 1)$ and large enough $t \in \mathbb{N}$,
one can design a concatenated $[N, K, D]_q$ linear code with
$N = t q^{\lfloor \epsilon t \rfloor}$ and $K = \lfloor \epsilon t \rfloor \lceil \rho q^{\lfloor \epsilon t \rfloor} \rceil$, whose decoding algorithm
requires $O(N^2 \log N)$ operations in $\mathbb{F}_q$, $\mathbb{R}$ [27]. Furthermore,
the code achieves probability of error of at most $q^{-c(\epsilon, \rho) N}$
over a $q$-ary symmetric channel with crossover probability $\lambda$,
where $c(\epsilon, \rho)$ is a positive constant dependent only on $q, \epsilon, \rho$.
We refer to this linear code as $C_{con}(t, \epsilon, \rho)$. Designing $\mathbf{G}$ as
the generator matrix of this code gives the following corollary:

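Corollary IV.2. Given $\lambda \in (0, 1 - q^{-1})$, $\rho \in (0, 1)$, $\epsilon \in
(0, 1 - H_q(\lambda))$, $s \in \mathbb{N}$, $s \ge \lceil \log_q n \rceil$, it is possible to exactly
recover $\mathbf{x} \in S_q$ from $\mathbf{y}$ with probability $\ge (1 - q^{-cbs})$ ($c > 0$ is
dependent on $q, \epsilon, \rho$) for sufficiently large $b, n$ using $O((n +
b \log(bs)) b s^2)$ operations in $\mathbb{F}_q$ and $\mathbb{R}$ if $n > 2b$ and $\mathbf{A} =
\mathbf{G}\mathbf{A}'$, where $\mathbf{G}$ is the generator matrix of $C_{con}(\lceil t^* \rceil, \epsilon, \rho)$, $t^* \in
\mathbb{R}$ being the solution to $\epsilon \rho x q^{\epsilon x} = 4qbs$, and some set of $2bs$
rows of $\mathbf{A}'$ forms $\mathbf{A}''$ such that $\phi_s(\mathbf{A}'')$ is the parity check
matrix of a $[n, n - 2b, 2b + 1]_{q^s}$ Reed-Solomon code.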



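Proof: The recovery of $\mathbf{x}$ from $\mathbf{y}$ with probability
$\ge (1 - q^{-cbs})$, for some constant $c > 0$, follows from
Theorem IV.1 with $m'' = 2b$, the properties of codebook
$C_{con}(\lceil t^* \rceil, \epsilon, \rho)$ and the fact that the number of rows of $\mathbf{A}'$ is
$\lfloor \epsilon \lceil t^* \rceil \rfloor \lceil \rho q^{\lfloor \epsilon \lceil t^* \rceil \rfloor} \rceil$, that is bounded below by $2bs$ and bounded
above by $16 q^{1+\epsilon} bs$, for large enough values of $b, n$. By
property of the code $C_{con}(\lceil t^* \rceil, \epsilon, \rho)$, $\mathbf{A}'\mathbf{x}$ is recoverable from
$\mathbf{y}$ using $O\big((\lceil t^* \rceil q^{\lfloor \epsilon \lceil t^* \rceil \rfloor})^2 \log(\lceil t^* \rceil q^{\lfloor \epsilon \lceil t^* \rceil \rfloor})\big) = O(b^2 s^2 \log(bs))$
operations in $\mathbb{F}_q$ and $\mathbb{R}$. The knowledge of $\mathbf{A}'\mathbf{x}$ gives $\mathbf{A}''\mathbf{x}$
since the rows of $\mathbf{A}''$ form a subset of the rows of $\mathbf{A}'$.
Thereafter, $\mathbf{x}$ can be obtained from $\mathbf{A}''\mathbf{x}$ via field lifting
and syndrome decoding using $O(nbs^2)$ operations in $\mathbb{F}_q$, as
described in the proof of Corollary III.2. This amounts to a
total of $O((n + b \log(bs)) b s^2)$ operations in $\mathbb{F}_q$, $\mathbb{R}$. ■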

The above corollary suggests that one can design $m =
\Theta(b \log_q n)$ measurements (by choosing $s = \lceil \log_q n \rceil$) for
recovering any $\mathbf{x} \in S_q$ in the presence of $q$-ary symmetric
noise for large enough values of $b, n$, using $O((n + b \log b +
b \log\log n) b (\log n)^2)$ field operations in $\mathbb{F}_q$, $\mathbb{R}$. This scaling
of $m$ is order-wise optimal for $b = O(n^{\alpha})$, $\alpha \in [0, 1)$, with
respect to the information-theoretic lower bound. Note that
there are no theoretical guarantees in the context of real-valued
compressive sensing for this noise model; most of the guarantees
are designed for Gaussian and $\ell_2$-norm bounded noise.

B. Worst-case Noise Model 

The worst-case noise model has been used for modeling 
corruption in storage media as well as channel transmission 
errors. Here, we assume that n comes from the following 
ensemble of signals with bounded number of non-zero entries: 

$$\mathcal{N}(\delta) = \{\mathbf{n} \in \mathbb{F}_q^m : \mathrm{wt}(\mathbf{n}) \le \delta m\}, \quad 0 < \delta < 1/2.$$

For the sake of simplicity, we assume that $\delta m \in \mathbb{N}$. Here,
we recover $\mathbf{x}$ from $\mathbf{y}$ in two steps, similar to the recovery
procedure for the probabilistic noise model. First, we eliminate
the errors introduced by $\mathbf{n}$ using the error correction capability
of linear codes, and obtain a noiseless compressed version of
$\mathbf{x}$. Next, we reconstruct $\mathbf{x}$ from this version using the approach
described in Section III. The following theorem holds here:



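Theorem IV.3. Given $\delta \in (0, (1 - q^{-1})/2)$, $\mathbf{n}$ coming from
$\mathcal{N}(\delta)$, $m > m' \ge m''s$, $m', m'', s \in \mathbb{N}$, it is possible to exactly
recover $\mathbf{x} \in \mathcal{S}_q$ from $\mathbf{y}$ if $\mathbf{A} = \mathbf{G}\mathbf{A}'$, where $\mathbf{G}$ is the generator
matrix of a $[m, m', d]_q$ linear code, $d > 2\delta m$, and some set
of $m''s$ rows of $\mathbf{A}'$ forms $\mathbf{A}''$ such that $\phi_s(\mathbf{A}'')$ is the parity
check matrix of a $[n, n - m'', d']_{q^s}$ linear code, $d' > 2b$.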

Proof: Using the given form of $\mathbf{A}$, we have $\mathbf{y} = \mathbf{G}\mathbf{A}'\mathbf{x} + \mathbf{n}$.
Since $\mathbf{G}$ is the generator matrix of a $[m, m', d]_q$ linear code,
$\mathbf{G}\mathbf{A}'\mathbf{x}$ can be treated as a codeword and $\mathbf{A}'\mathbf{x}$ can be treated as the
message vector generating it. Therefore, a decoding algorithm
can be used to recover $\mathbf{A}'\mathbf{x}$ from $\mathbf{y}$, since $\mathrm{wt}(\mathbf{n}) \le \delta m$ and
the linear code can correct up to $\delta m$ errors. Note that the
restriction $\delta \in (0, (1 - q^{-1})/2)$ arises due to Plotkin's bound
for linear codes [27]. The knowledge of $\mathbf{A}'\mathbf{x}$ gives $\mathbf{A}''\mathbf{x}$ since
the rows of $\mathbf{A}''$ form a subset of the rows of $\mathbf{A}'$. Thereafter,
$\mathbf{x}$ can be recovered, as described in Theorem III.1. ■
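The first step of this proof (stripping the bounded-weight noise before any sparse recovery) can be illustrated with a toy code. The sketch below is only an illustration under stated assumptions: it uses the binary $[7, 4, 3]$ Hamming code as a stand-in for $\mathbf{G}$, so $m = 7$, $m' = 4$ and $\delta m = 1$, and it decodes by exhaustive minimum-distance search, which is not the decoder of the codes used in the paper. The names `encode`, `decode` and `compressed` are hypothetical.

```python
from itertools import product

# Generator matrix (7 x 4) of the binary [7,4,3] Hamming code in systematic
# form, so that codeword = G * message over F_2. Illustrative stand-in for G.
G = [
    [1, 0, 0, 0],
    [0, 1, 0, 0],
    [0, 0, 1, 0],
    [0, 0, 0, 1],
    [1, 1, 0, 1],
    [1, 0, 1, 1],
    [0, 1, 1, 1],
]

def encode(msg):
    """Return G * msg over F_2 as a list of 7 bits."""
    return [sum(g * u for g, u in zip(row, msg)) % 2 for row in G]

def decode(y):
    """Minimum-distance decoding by exhaustive search over all 2^4 messages."""
    return min(product((0, 1), repeat=4),
               key=lambda u: sum(a != b for a, b in zip(encode(u), y)))

compressed = (1, 0, 1, 1)        # plays the role of A'x
noise = [0, 0, 0, 0, 0, 1, 0]    # wt(noise) = 1 = delta * m
y = [(c + e) % 2 for c, e in zip(encode(compressed), noise)]
recovered = decode(y)            # equals `compressed` because d = 3 > 2 * 1
```

Once `recovered` equals $\mathbf{A}'\mathbf{x}$, the remaining work is the noiseless recovery of Section III applied to the rows that form $\mathbf{A}''$.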
There exist families of linear codes with good minimum
distance properties, like concatenated codes. In particular,
given $\epsilon, \rho \in (0, 1)$ and large enough $t \in \mathbb{N}$, it is possible to
design a concatenated $[N, K, D]_q$ linear code with $N = t q^{\lfloor \epsilon t \rfloor}$,
$K = \lfloor \epsilon t \rfloor \lceil \rho q^{\lfloor \epsilon t \rfloor} \rceil$, $D > (H_q^{-1}(1 - \epsilon))(1 - \rho) t q^{\lfloor \epsilon t \rfloor}$. We refer
to this linear code as $\mathcal{D}_{\mathrm{con}}(t, \epsilon, \rho)$; also, its decoding process
requires $O(N^2 \log N)$ operations in $\mathbb{F}_q$ and $\mathbb{R}$ [27]. Designing
$\mathbf{G}$ as its generator matrix results in the following corollary:

Corollary IV.4. Given p G (0,1), S G (0, (1 - - 
y/p)/2), e = 1 - Hq{26/{1 - n coming from J\f{S), 

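$s \in \mathbb{N}$, $s \ge \lceil \log_q n \rceil$, it is possible to exactly recover $\mathbf{x} \in \mathcal{S}_q$
from $\mathbf{y}$ for sufficiently large $b, n$ using $O((n + b\log(bs))bs^2)$
operations in $\mathbb{F}_q$ and $\mathbb{R}$ if $n > 2b$ and $\mathbf{A} = \mathbf{G}\mathbf{A}'$, where $\mathbf{G}$
is chosen as the generator matrix of $\mathcal{D}_{\mathrm{con}}(\lceil t^* \rceil, \epsilon, \rho)$, $t^* \in \mathbb{R}$
being the solution to $\epsilon \rho x q^{\epsilon x} = 4qbs$, and some set of $2bs$
rows of $\mathbf{A}'$ forms $\mathbf{A}''$ such that $\phi_s(\mathbf{A}'')$ is the parity check
matrix of a $[n, n - 2b, 2b + 1]_{q^s}$ Reed-Solomon code.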
Proof: The recovery of $\mathbf{x}$ from $\mathbf{y}$ follows from Theorem IV.3
with $m'' = 2b$, the properties of the codebook
$\mathcal{D}_{\mathrm{con}}(\lceil t^* \rceil, \epsilon, \rho)$, and the fact that the number of rows of
$\mathbf{A}'$ is $\lfloor \epsilon \lceil t^* \rceil \rfloor \lceil \rho q^{\lfloor \epsilon \lceil t^* \rceil \rfloor} \rceil$, which is bounded below by $2bs$ and
bounded above by $16 q^{1+\epsilon} bs$ for large values of $b, n$. By the
property of the codebook, $\mathbf{A}'\mathbf{x}$ is recoverable from $\mathbf{y}$ in
$O((\lceil t^* \rceil q^{\epsilon \lceil t^* \rceil})^2 \log(\lceil t^* \rceil q^{\epsilon \lceil t^* \rceil})) = O(b^2 s^2 \log(bs))$
operations in $\mathbb{F}_q, \mathbb{R}$. The knowledge of $\mathbf{A}'\mathbf{x}$ gives $\mathbf{A}''\mathbf{x}$ since the
rows of $\mathbf{A}''$ form a subset of the rows of $\mathbf{A}'$. Then $\mathbf{x}$ can
be obtained from $\mathbf{A}''\mathbf{x}$ via syndrome decoding using $O(nbs^2)$
operations in $\mathbb{F}_q$ using Corollary III.2. This amounts to a total
of $O((n + b\log(bs))bs^2)$ operations in $\mathbb{F}_q$ and $\mathbb{R}$. ■
The above corollary suggests that one can design $m = \Theta(b \log_q n)$
measurements (by choosing $s = \lceil \log_q n \rceil$) for
recovering any $\mathbf{x} \in \mathcal{S}_q$ in the worst-case noise setting for large
enough $b, n$, using $O((n + b\log b + b\log\log n)\, b (\log n)^2)$ field
operations in $\mathbb{F}_q$, provided the fraction of corrupted symbols
$\delta$ is $< (1 - q^{-1})/2$. This scaling of $m$ is order-wise optimal
for $b = O(n^{\alpha})$, $\alpha \in [0, 1)$, with respect to the information-theoretic
lower bound. Note that, similar to the probabilistic noise
model, there are no theoretical guarantees for real-valued
compressive sensing based on worst-case noise.

V. Simulation Results

In this section, we present simulation results showing the
utility of our approach in the context of sensing and recovering
sparse data, and tracking discrete-valued time series.

Synthetic sparse data: We show the effect of different
sparsity levels on our approach vs. real-valued compressive
sensing using synthetically generated sparse data. We choose
$n = 1024$ and four sparsity levels, $b = \lceil n^r \rceil$, $r = 0.2$,
$0.4$, $0.6$, $0.8$. We set the number of linear measurements as
$m = 2\lfloor \theta b \rfloor \lceil \log_q n \rceil$, where $0.2 \le \theta \le 3$ and $q = 256$.

For real-valued compressive sensing, we take 200 realizations
of a $b$-sparse vector $\mathbf{x} \in \mathbb{R}^n$, by choosing $b$ positions of
an $n$-length zero vector uniformly at random and setting the
entries to $1 \in \mathbb{R}$. We construct $\mathbf{A} \in \mathbb{R}^{m \times n}$ by choosing its
entries in an i.i.d. fashion from the Gaussian distribution with
mean 0 and variance $1/\sqrt{m}$. We reconstruct $\hat{\mathbf{x}}$, the estimate
for $\mathbf{x}$, using $\ell_1$-minimization and define the error-free event
as the normalized error $\|\mathbf{x} - \hat{\mathbf{x}}\|_2/\sqrt{n}$ falling below a small
threshold (a negative power of 10). The probability of recovery is
defined as the number of times the error-free event occurs
divided by the number of sparse realizations, i.e., 200.

For the algebraic approach, we take the same sparse vector
realizations, but treat their entries as elements of $\mathbb{F}_q$ (0 is
treated as the zero element of the field and 1 is treated as the
identity element of $\mathbb{F}_q$). We compress and recover the sparse
vectors using the approach described in Section III. With
$s = \lceil \log_q n \rceil$, we choose the sensing matrix $\mathbf{A} \in \mathbb{F}_q^{m \times n}$ such that
$\phi_s(\mathbf{A})$ is the parity check matrix of a $[n, n - 2\lfloor \theta b \rfloor, 2\lfloor \theta b \rfloor + 1]_{q^s}$
Reed-Solomon code; this ensures that the number of measurements
is $m = 2\lfloor \theta b \rfloor \lceil \log_q n \rceil$. We also define the error-free
event as the case of exact recovery of $\mathbf{x}$ and the probability of
recovery as the number of times exact recovery occurs divided
by the number of sparse realizations chosen, i.e., 200.
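The algebraic compress-and-recover step can be sketched in a few lines when the field is simplified. The sketch below makes several assumptions that are not in the paper: it works over the prime field $\mathbb{F}_{257}$ with $n \le 256$ (so $s = 1$ and no field lifting map $\phi_s$ is needed), it uses the locators $1, 2, \ldots, n$ to build a Vandermonde parity check matrix of a Reed-Solomon-type code, and it recovers the sparse vector with a Peterson-Gorenstein-Zierler style syndrome decoder, which is one standard polynomial-time choice and not necessarily the decoder of Section III. The function names are hypothetical.

```python
P = 257  # a prime, so integers mod P form a field (assumed stand-in for F_{q^s})

def solve_mod(M, rhs):
    """Solve M z = rhs over F_P by Gauss-Jordan elimination; None if M is singular."""
    k = len(M)
    aug = [list(row) + [r] for row, r in zip(M, rhs)]
    for col in range(k):
        piv = next((r for r in range(col, k) if aug[r][col]), None)
        if piv is None:
            return None
        aug[col], aug[piv] = aug[piv], aug[col]
        scale = pow(aug[col][col], P - 2, P)  # inverse via Fermat's little theorem
        aug[col] = [v * scale % P for v in aug[col]]
        for r in range(k):
            if r != col and aug[r][col]:
                f = aug[r][col]
                aug[r] = [(a - f * c) % P for a, c in zip(aug[r], aug[col])]
    return [row[k] for row in aug]

def sense(x, b):
    """Return the 2b measurements y = H x, where H[j][i] = (i + 1)^j mod P."""
    return [sum(v * pow(i + 1, j, P) for i, v in enumerate(x)) % P
            for j in range(2 * b)]

def recover(y, n, b):
    """Return the vector with at most b non-zero entries whose measurements are y."""
    for v in range(b, 0, -1):
        # Hankel system for the locator polynomial; singular if v exceeds the true weight.
        M = [[y[i + l] for l in range(v)] for i in range(v)]
        c = solve_mod(M, [(-y[i + v]) % P for i in range(v)])
        if c is not None:
            break
    else:
        return [0] * n  # every Hankel matrix is singular, so x = 0
    # Roots of z^v + c[v-1] z^(v-1) + ... + c[0] are the locators of the support.
    support = [i for i in range(n)
               if (pow(i + 1, v, P)
                   + sum(cm * pow(i + 1, m, P) for m, cm in enumerate(c))) % P == 0]
    if len(support) != v:
        raise ValueError("measurements do not come from a b-sparse vector")
    # Vandermonde system for the non-zero values.
    vals = solve_mod([[pow(i + 1, j, P) for i in support] for j in range(v)], y[:v])
    x = [0] * n
    for i, val in zip(support, vals):
        x[i] = val
    return x

x_true = [0] * 20
x_true[2], x_true[7], x_true[15] = 5, 200, 1
y = sense(x_true, b=3)            # 2b = 6 field symbols
x_hat = recover(y, n=20, b=3)
```

In this simplified setting the number of measurements is exactly $2b$, which matches $m = 2b\lceil \log_q n \rceil$ with $\lceil \log_q n \rceil = 1$.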



Figure 2a shows the plot of $\theta$ vs. probability of recovery.
The fact that the number of measurements required for the
algebraic approach is smaller than that of real-valued
compressive sensing for sparsity levels $b = \lceil n^r \rceil$, $r = 0.2$, $0.4$,
$0.6$, corroborates the remark made about sample complexity
in Section III. This also implies less storage space for
measurements and sensing matrices for cases of low sparsity
levels, since an element in $\mathbb{F}_q$ can be represented in $\log_2 q = 8$
bits whereas reals are generally assigned more bits (assigning
fewer bits would incur quantization error as overhead).

Note that the sufficient condition of $m = 2b\lceil \log_q n \rceil$
measurements for the algebraic approach implies that the required
number of measurements decreases as the field size increases.
We demonstrate this fact in Figure 2b, where we consider the
compressive sensing setup over a finite alphabet represented by
$\mathbb{F}_q$, $q = 2^i$, $i = 1, 2, \ldots, 16$, with $n = 2048$, $4096$ and
$b = \lceil n^r \rceil$, $r = 0.2$, $0.4$, $0.6$, $0.8$. This figure shows the plot
of $\log_2 q$ (bit-resolution of $\mathbb{F}_q$) vs. $m = 2b\lceil \log_q n \rceil$. One can
observe that the number of measurements saturates to $2b$ for
large values of $q$, a lower bound on the sample complexity for
differentiating between two $b$-sparse vectors or signals.
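The saturation of $m = 2b\lceil \log_q n \rceil$ at $2b$ is easy to tabulate. The sketch below is a minimal illustration, assuming $b = \lceil n^r \rceil$ with $r = 0.2$ and $n = 2048$; the helper names are hypothetical, and $\lceil \log_q n \rceil$ is computed with integer arithmetic to avoid floating-point rounding at exact powers.

```python
import math

def symbols_per_measurement(n, q):
    """Smallest integer s with q**s >= n, i.e. ceil(log_q n), without floats."""
    s, power = 0, 1
    while power < n:
        power *= q
        s += 1
    return s

def num_measurements(n, b, q):
    """Sufficient number of measurements m = 2 * b * ceil(log_q n)."""
    return 2 * b * symbols_per_measurement(n, q)

n = 2048
b = math.ceil(n ** 0.2)  # sparsity level for r = 0.2
table = {i: num_measurements(n, b, 2 ** i) for i in (1, 2, 4, 8, 11, 16)}
for bits, m in table.items():
    print(f"log2(q) = {bits:2d}  ->  m = {m}")
```

For this choice of $n$, the count stops decreasing once $q \ge n$, that is from $\log_2 q = 11$ onward, where each measurement is a single field symbol.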

Tracking discrete-valued time series: The problem of
tracking time series is an important one and has been well
studied in the literature [30], [31]. In many situations, the changes
in the variable associated with the time series are sparse, such
as in sequences of video frames and time series from human
motion recognition or animation. As such, the concept of
compressive sensing can be used: the idea is to compress
the sparse changes in the variable to reduce the amount of
memory needed for storing the time-series information.

As an example, consider a real-valued time series that
has been quantized to get a discrete-valued time series
$(\mathbf{z}_1, \mathbf{z}_2, \ldots, \mathbf{z}_t)$, where $\mathbf{z}_i \in \mathcal{A}^n$ and $\mathcal{A} \subset \mathbb{R}$ is the discrete
alphabet, with the property $\mathrm{wt}(\mathbf{e}_i) \le b$, $\mathbf{e}_i = \mathbf{z}_{i+1} - \mathbf{z}_i$,
$i = 1, 2, \ldots, t - 1$, and $b \ll n$. Then one approach to
compress the time series is to use real-valued compressive
sensing: consider a sensing matrix $\mathbf{A} \in \mathbb{R}^{m \times n}$ satisfying
some incoherence property and compress/track the discrete-valued
time series as $(\mathbf{z}_1, \mathbf{A}\mathbf{e}_1, \ldots, \mathbf{A}\mathbf{e}_{t-1})$. The decompression
algorithm consists of recovering $\mathbf{e}_1, \mathbf{e}_2, \ldots, \mathbf{e}_{t-1}$ using
$\ell_1$-minimization and obtaining the estimate of the discrete-valued
time series. Another approach is the algebraic one:
interpret $\mathcal{A}$ as a finite field or a subset of a finite field and
perform compression using field operations, using the methods
described in Section III. Next, we provide simulation results
for the tracking error of a synthetically generated quantized
time series and a time series based on real promotion data.

Fig. 2: Simulation results for (a),(b) synthetic sparse data, and (c),(d) discrete-valued time series.

We consider the parameter values $n = 1024$, $t = 500$ and
sparsity levels $b = \lceil n^r \rceil$, $r = 0.2$, $0.4$, $0.6$, for generating the
synthetic time series $(\mathbf{x}_1, \mathbf{x}_2, \ldots, \mathbf{x}_t)$, $\mathbf{x}_i \in \mathbb{R}^n$, as follows. We
construct $\mathbf{x}_1$ by selecting its real-valued entries uniformly at
random from $[-1, 1]$. We construct $\mathbf{f}_i$ from the $n$-length zero
vector by selecting $b$ random positions in $\mathbf{f}_i$ and replacing the
entries in those positions with uniformly selected real numbers
from $[-1, 1]$, $i = 1, 2, \ldots, t - 1$. This gives us the desired time
series with $\mathbf{x}_{i+1} = \mathbf{x}_i + \mathbf{f}_i$, $i = 1, 2, \ldots, t - 1$. For quantizing
this time series, we use a simple approach: we choose the
minimum and maximum values, say $m$ and $M$ respectively, that
an entry in the time series takes, and perform quantization in
steps of $(M - m)/q$, $q = 256$. Therefore, the quantized
time series, say $(\mathbf{z}_1, \mathbf{z}_2, \ldots, \mathbf{z}_t)$, is based on a finite alphabet
of size $q$. We define $\mathbf{e}_i = \mathbf{z}_{i+1} - \mathbf{z}_i$, $i = 1, 2, \ldots, t - 1$, which
are $b$-sparse by construction, and set $m = 2b\lceil \log_q n \rceil$.
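The construction above can be sketched as follows. This is a minimal illustration under stated assumptions, not the simulation code of the paper: it uses smaller values of $n$ and $t$ for speed, represents each quantized entry by its level index in $\{0, \ldots, q - 1\}$ instead of the real-valued level, clips the maximum value into the top level, and the names `make_series` and `quantize` are hypothetical.

```python
import random

def make_series(n, t, b, rng):
    """x_1 uniform in [-1, 1]^n, then x_{i+1} = x_i + f_i with f_i b-sparse."""
    x = [[rng.uniform(-1, 1) for _ in range(n)]]
    for _ in range(t - 1):
        nxt = x[-1][:]
        for pos in rng.sample(range(n), b):   # b distinct random positions
            nxt[pos] += rng.uniform(-1, 1)
        x.append(nxt)
    return x

def quantize(series, q):
    """Uniform quantization into q levels between the global min and max."""
    lo = min(min(v) for v in series)
    hi = max(max(v) for v in series)
    step = (hi - lo) / q
    return [[min(q - 1, int((e - lo) / step)) for e in v] for v in series]

rng = random.Random(0)
n, t, b, q = 64, 20, 4, 256
z = quantize(make_series(n, t, b, rng), q)

# Hamming weight of each change e_i = z_{i+1} - z_i.
weights = [sum(a != c for a, c in zip(z[i], z[i + 1])) for i in range(t - 1)]
```

An entry that is not touched by $\mathbf{f}_i$ keeps the same real value and therefore the same level, so every entry of `weights` is at most `b`, which is the sparsity the tracking scheme relies on.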

For the real-valued compressive sensing approach, we
perform compression and tracking of this quantized series
using a sensing matrix $\mathbf{A} \in \mathbb{R}^{m \times n}$, whose entries
are drawn i.i.d. from the Gaussian distribution with mean 0
and variance $1/\sqrt{m}$. We recover $\mathbf{e}_1, \mathbf{e}_2, \ldots, \mathbf{e}_{t-1}$ from
$\mathbf{A}\mathbf{e}_1, \mathbf{A}\mathbf{e}_2, \ldots, \mathbf{A}\mathbf{e}_{t-1}$ using $\ell_1$-minimization, obtain
estimates $(\hat{\mathbf{e}}_1, \ldots, \hat{\mathbf{e}}_{t-1})$ and estimate the quantized time series
as $(\hat{\mathbf{z}}_1, \hat{\mathbf{z}}_2, \ldots, \hat{\mathbf{z}}_t)$, $\hat{\mathbf{z}}_{i+1} = \hat{\mathbf{z}}_i + \hat{\mathbf{e}}_i$, $i = 1, 2, \ldots, t - 1$. We
define the tracking error at time $i$ as $\|\mathbf{x}_i - \hat{\mathbf{z}}_i\|_2/\sqrt{n}$. For the
algebraic approach, we treat the quantization alphabet as $\mathbb{F}_q$
and use the sensing matrix construction in Section III based
on Reed-Solomon codes. Since $m = 2b\lceil \log_q n \rceil$, we have
exact recovery in this case, so the tracking error at time $i$ only
comprises the quantization error $\|\mathbf{x}_i - \mathbf{z}_i\|_2/\sqrt{n}$. Figure 2c
shows the plot of tracking error vs. time index for different
sparsity levels. Note that the increasing nature of the tracking
error for real-valued compressive sensing is due to error
propagation in the estimates of the time series; this includes
both the quantization error and the error in determining
the sparsity patterns of the changes in the time-series vector
variable. Also, the tracking error reduces with an increase in $b$,
since $m$ increases and approaches the optimal number
of measurements required for real-valued compressive sensing
for error-free recovery of the sparse variable changes.
The promotional data time series comes from [32]. We
make use of the 'promotions.dat' file containing a time series
with vector length $n = 1000$ and number of time indices
equal to 1000. The time index refers to the day index (so it is 3
years of data) and the vector entries refer to the products.
The entries in the time series come from $\{0, 1\}$: 1 means the
product was promoted that day and 0 means no promotion
for the product. We first consider data for the first year
for our simulations, $t = 1$ to $t = 365$. We refer to this
time series as $(\mathbf{z}_1, \mathbf{z}_2, \ldots, \mathbf{z}_T)$, $\mathbf{z}_i \in \{0, 1\}^n$, $T = 365$. We
observe that the Hamming distance between two successive
vectors never exceeds 60. In other words, the changes in the
support indices of successive vectors are $b$-sparse, $b = 60$.
We perform the same manner of tracking/compression as we
do for the synthetic time series, the only difference being
that the promotional time series is already discrete-valued and
hence there is no need for quantization. We set the number of
measurements as $m = 2b\lceil \log_q n \rceil$, $q = 1024$. For the algebraic
approach, we treat $\{0, 1\}$ as a subset of $\mathbb{F}_q$ (0 is treated as the
zero element of the field and 1 is treated as the identity element
of the field) and use the sensing matrix construction in Section III.
Since $m = 2b\lceil \log_q n \rceil$, the tracking error is always zero for
the algebraic approach over $\mathbb{F}_q$, i.e., we have perfect tracking.
Figure 2d shows the tracking error with increasing time index.
Note that the tracking error for compressive sensing over reals
increases with time due to error propagation in the estimates
of the time series. We repeat the same procedure for time
series data for the second and third years. Also, note that the
algebraic approach requires less storage space if
a real number is assigned $\log_2 q = 10$ bits or more.

VI. Conclusion 

In this paper, we develop an algebraic framework for compressive
sensing over a finite alphabet; this bridges the areas of
coding theory and compressive sensing in a thought-provoking
way. In particular, it gives us tools and constructive approaches
for designing sensing matrices as well as polynomial-time
algorithms for sparse source recovery, all while
maintaining optimality in terms of sample complexity for
exact recovery. Furthermore, we demonstrate that our approach
outperforms real-valued compressive sensing in terms
of sample complexity and storage space if the sparsity level is
below some threshold, and is extendible to the case of noisy
measurements, with respect to a broad range of noise models.
In terms of utility, finite alphabet compressive sensing proves
to be a natural fit for the purpose of compressing/tracking
discrete-valued time-series data with sparse changes.



References 



[1] R. Baraniuk, "Compressive sensing," IEEE Signal Process. Magazine, vol. 24, pp. 118-121, Jul 2007.
[2] F. Parvaresh, H. Vikalo, S. Misra, and B. Hassibi, "Recovering sparse signals using sparse measurement matrices in compressed DNA microarrays," IEEE Sel. Topics in Signal Proc., vol. 2, 2008.
[3] J. Romberg, "Imaging via compressive sampling," IEEE Signal Process. Magazine, vol. 25, pp. 14-20, Mar 2008.
[4] M. Duarte, M. Davenport, D. Takhar, J. Laska, T. Sun, K. Kelly, and R. G. Baraniuk, "Single-pixel imaging via compressive sampling," IEEE Signal Process. Magazine, vol. 25, pp. 83-91, Mar 2008.
[5] E. Candes and T. Tao, "Decoding by linear programming," IEEE Trans. on Information Theory, vol. 51, pp. 4203-4215, Dec 2005.
[6] E. Candes and T. Tao, "Near optimal signal recovery from random projections: Universal encoding strategies?," IEEE Trans. on Information Theory, vol. 52, pp. 5406-5425, Dec 2006.
[7] D. Donoho, "Compressed sensing," IEEE Trans. on Information Theory, vol. 52, pp. 1289-1306, Apr 2006.
[8] R. Tibshirani, "Regression shrinkage and selection via the Lasso," Royal Statistical Society, vol. 58, pp. 267-288, 1996.
[9] S. Chen, D. Donoho, and M. Saunders, "Atomic decomposition by basis pursuit," SIAM Journal of Scien. Comp., vol. 20, pp. 33-61, 1998.
[10] E. Candes and T. Tao, "The Dantzig selector: Statistical estimation when p is much larger than n," Annals of Statistics, vol. 35, pp. 2392-2404.
[11] S. Sarvotham, D. Baron, and R. Baraniuk, "Sudocodes - fast measurement and reconstruction of sparse signals," IEEE ISIT, Jul 2006.
[12] D. Donoho and Y. Tsaig, "Fast solution of $\ell_1$-norm minimization problems when the solution may be sparse," Stanford University Department of Statistics Technical Report, 2006.
[13] J. Tropp and A. Gilbert, "Signal recovery from random measurements via orthogonal matching pursuit," IEEE Trans. on Information Theory, vol. 53, pp. 4655-4666, Dec 2007.
[14] E. Candes and J. Romberg, "Sparsity and incoherence in compressive sampling," Inverse Problems, vol. 23, pp. 969-985, 2007.
[15] M. Wainwright, "Sharp thresholds for noisy and high-dimensional recovery of sparsity using $\ell_1$-constrained quadratic programming (Lasso)," IEEE Trans. on Information Theory, 2009.
[16] S. Draper and S. Malekpour, "Compressed sensing over finite fields," IEEE ISIT, pp. 669-673, Jul 2009.
[17] F. Zhang and H. Pfister, "Compressed sensing and linear codes over real numbers," IEEE ITA Workshop, pp. 558-561, Jan 2008.
[18] W. Ryan and S. Lin, "Channel codes: Classical and modern," Cambridge University Press, 2009.
[19] I. Csiszar, "Linear codes for sources and source networks: error exponents, universal coding," IEEE Trans. on Information Theory, vol. 28, pp. 585-592, Jul 1982.
[20] W. Xu and B. Hassibi, "Efficient compressive sensing with deterministic guarantees using expander graphs," IEEE ITW, Sept 2007.
[21] S. Sarvotham, D. Baron, and R. Baraniuk, "Compressed sensing reconstruction via belief propagation," Rice ECE Dept. Tech. Report, 2006.
[22] A. Dimakis and P. Vontobel, "LP decoding meets LP decoding: a connection between channel coding and compressed sensing," IEEE Allerton, 2009.
[23] M. Cheraghchi, "Coding-theoretic methods for sparse recovery," IEEE Allerton, 2011.
[24] D. Sejdinovic, A. Muller, and R. Piechocki, "Approximate message passing under finite alphabet constraints," IEEE ICASSP, 2012.
[25] G. Leus, Z. Tian, and V. Lottici, "Detection of sparse signals under finite-alphabet constraints," IEEE ICASSP, 2009.
[26] T. Cover and J. Thomas, "Elements of information theory," Wiley-Interscience 2nd Edition, 2006.
[27] R. Roth, "Introduction to coding theory," Cambridge Univ. Press, 2006.
[28] H. Lou, W. Zhong, and J. Frias, "LDGM codes for channel coding and joint source-channel coding of correlated sources," EURASIP Journal on Signal Processing, 2005.
[29] I. Dumer, "Concatenated codes and their multilevel generalizations," Handbook of Coding Theory, 1998.
[30] J. Li, H. Chen, and P. Mohapatra, "RACE: Time series compression with rate adaptivity and error bound for sensor networks," IEEE MASS, 2004.
[31] R. Xu, O. Concha, and M. Piccardi, "Compressive sensing of time series for human action recognition," ACM DICTA, 2010.
[32] Causality Workbench Team, "PROMO: Simple causal effects in time series," Aug. 2008.



